(54) A method and system for suggesting related documents



(57) The document reading system passively analyzes a
document to generate margin or end notes of references to
other documents that relate to annotated passages in the
document or to the entire document. The invention is
responsive to the annotation of a document to passively
generate a query that retrieves documents that have
similar content to the annotated passage. The retrieved
documents are available to the reader through selectable
links placed in the margin near the annotation.
Additionally, the invention provides end notes with links
to documents that are similar in content to the overall
content of the annotated document. The invention assists
the reader by passively generating selectable links to
related documents to assist the user in relating the new
document to previously read material.
Description

[0001] This invention relates generally to electronic 
document reading systems. In particular, this invention 
is directed to an electronic document reading system 
that suggests other related documents when displaying 
a first document. 

[0002] Retrieving documents similar to a document
identified by the user as being related is known as
relevance feedback. Relevance feedback is described in
"Introduction to Modern Information Retrieval", G. Salton
et al., McGraw Hill, (1983), incorporated herein by
reference in its entirety. Interfaces that support
relevance feedback conventionally require explicit action
on the part of the reader and do not spontaneously offer
suggestions of relevant documents. Information
exploration interfaces designed for window-based computing
environments typically present search results for other
relevant documents via lists in a separate window or by
replacing the visible document with the search results.
These systems are very intrusive and interrupt the
reading process.

[0003] Hypertext interfaces display links to documents
relevant to a source document either by providing a
margin that contains the links or by embedding the links
in the text of the source document in the manner
pioneered by "Hyperties." This system is described in
"User Interface Design for the Hyperties Electronic
Encyclopedia", by B. Shneiderman, Proceedings of Hypertext
'87, November 1987, Chapel Hill, NC, incorporated herein
by reference in its entirety. However, these links are
static and are created along with the source document by
the hypertext author. Some systems, such as Trellis,
display links dynamically, but only from a fixed set of
previously-defined links. Trellis is described in
"Programmable Browsing Semantics and Trellis", by R.
Furuta et al., Proceedings of Hypertext '89, November
1989, Pittsburgh, PA, ACM Press, incorporated herein by
reference in its entirety.

[0004] The HieNet System uses inter-node similarity
measures to create hypertext links based on links
previously created by the hypertext author. This system
is described in "Hienet: A User-Centered Approach for
Automatic Link Generation", D.T. Chang, Proceedings of
Hypertext '93, November 1993, Seattle, WA, ACM Press,
incorporated herein by reference in its entirety. When
the author creates a link from a document A to a document
B, the system automatically adds links from all documents
similar to document A to all documents similar to
document B. Anchors for these automatically-generated
links are represented by icons in the margin of the
various documents. Clicking on an icon displays a pop-up
menu that contains a list of possible destination
documents that are ranked by relevance to the query.
Again, this system relies on links previously created by
the author.

[0005] Other conventional systems relate to
hypertext-like ways of displaying search results. HieNet
displays automatic links in the margin, but anchors in
the margin are not relevant to the content of the passage
adjacent to the anchor. HieNet does not distinguish
between document-document and passage-document links.
Furthermore, HieNet does not indicate the number and
nature of the documents reachable through the margin
links.

[0006] Visualization of Information Retrieval System
(hereinafter VOIR) is described in "Queries? Links? Is
There a Difference?", Proceedings of CHI '97, G.
Golovchinsky, March 1997, Atlanta, GA, ACM Press and in
"What the Query Told the Link: The Integration of
Hypertext and Information Retrieval", Proceedings of
Hypertext '97, G. Golovchinsky, April 1997, Southampton,
UK, ACM Press, each incorporated herein by reference in
its entirety. VOIR is a mechanism that dynamically
creates and resolves hypertext links with queries that
are computed from the text surrounding a selected anchor.
VOIR uses queries to retrieve sets of documents that are
related to the passage containing the selected anchor.
VOIR does not show the user links that have
pre-established relationships. Rather, to submit a query
and to establish a relationship, the user has to pause
and select an anchor. VOIR was designed specifically to
support interactive information exploration, rather than
to facilitate the reading process. Thus, VOIR's focus is
supporting navigation between documents. The user is thus
expected to devote much cognitive effort to browsing.
Furthermore, VOIR does not permit the user to annotate or
tag documents. VOIR also does not indicate which link was
selected to generate a particular display.

[0007] A background information retrieval process called
the Remembrance Agent (hereinafter RA) is described in "A
Continuously Running Automated Information Retrieval
System", B.J. Rhodes et al., Proceedings of The First
International Conference on the Practical Application of
Intelligent Agents in Multi-Agent Technology, PAAM '96,
April 1997, London, UK, incorporated herein by reference
in its entirety. RA operates in an EMACS text window and
suggests documents related to the last few lines of text
typed by the user. RA is designed to search through a
user's private data to suggest documents related to the
text being typed. However, these suggestions are
ephemeral and relate only to text that is currently being
written. RA does not support reading tasks because it
continuously replaces suggestions as the user edits the
document.

[0008] QRL is a query-based information exploration
interface that uses ink-like marks on text to specify
boolean queries. This system is described in
"Queries-R-Links: Graphical Markup for Text Navigation",
by G. Golovchinsky et al., Proceedings of INTERCHI '93,
April 1993, Amsterdam, The Netherlands, ACM Press,
incorporated herein by reference in its entirety. Query
terms are selected with rectangles. Lines connect the
rectangles to represent boolean AND operators.

[0009] All of these systems require extensive user
interaction to generate links to related documents or
only support writing. An electronic document reading
system is needed that passively and unobtrusively
generates links to related documents to support reading.

[0010] This invention provides a method and a system
for passively showing the reader related documents 
without interfering with the reading process. 
[0011] The invention further provides intuitive support 
for reading by automatically detecting documents po- 
tentially of interest to the reader based on the reader's 
interaction with the source document being read. When 
people read text, they often make annotations to high- 
light interesting or controversial passages and terms. 
The presence or relative density of such marks and 
scribbles may be used as an indicator of the relative in- 
terest that the reader has in a particular passage. When 
a large body of documents related to the document being
read is available, the reader may be interested in
finding related documents as part of the reading
process.

[0012] References to documents related to specific 
passages of interest to the user are placed in the source 
document's margins and references to documents sim- 
ilar overall to the source document are inserted as end 
notes. The system and method of this invention maintain 
the links once they have been identified to facilitate non- 
linear reading and skimming. 

[0013] A user's interests are inferred from annotations
made while reading the source document. Therefore, the
system and method of this invention minimize cognitive
overhead in two ways: 1) no explicit query is required to
identify documents related to the source document; and 2)
selectable links to the related documents are provided
unobtrusively in the margins and at the end of the
document, as shown in Figs. 2 and 3, respectively.

[0014] The system also introduces suggestions to the
reader in a manner compatible with other interactions, 
rather than burdening the user with modal dialogues. 
Suggested documents are accessible by following the 
selectable links. However, the user does not have to act 
on a suggestion when it is made. Rather, the user can 
act on the suggestion when (or if) it makes sense to do 
so. The system and method of this invention represent 
the type of the referenced document with an icon and 
provide a textual label to the icon to give users a better
understanding of the target of the link. 
[0015] These and other features and advantages of 
this invention are described in or apparent from the fol- 
lowing detailed description of the preferred embodi- 
ments. 

[0016] The preferred embodiments of this invention 
will be described in detail, with reference to the following 
figures, wherein: 

Fig. 1 is a block diagram of one embodiment of the
electronic document reading system of this invention;

Fig. 2 shows a source document having an icon in
the margin adjacent to an annotated passage;

Fig. 3 shows another source document having an
endnote; and

Fig. 4 is a flowchart outlining a control routine for
one embodiment of this invention.

[0017] Fig. 1 shows a block diagram of one embodiment of
a document reading system 10 according to this invention.
The document reading system 10 includes a processor 12
communicating with a first memory 14 that stores a source
document 16 that is currently being read by a user on a
display 18. The processor 12 also communicates with a
second memory 20 that stores potentially related target
documents 22. A user interacts with and controls the
document reading system 10 through any number of
conventional input/output devices 24, such as a mouse 26,
a keyboard 28, or a pen-based interface 30. The
input/output devices 24 communicate with an input/output
interface 31 that, in turn, communicates with the
processor 12.

[0018] As shown in Fig. 1, the system 10 is preferably
implemented on a programmed general purpose computer.
However, the system 10 can also be implemented using a
special purpose computer, a programmed microprocessor or
microcontroller and any necessary peripheral integrated
circuit elements, an ASIC or other integrated circuit, a
hardwired electronic or logic circuit such as a discrete
element circuit, a programmable logic device such as a
PLD, PLA, FPGA or PAL, or the like. In general, any
device on which a finite state machine capable of
implementing the flowchart shown in Fig. 4 can be
implemented can be used to implement the system 10.

[0019] Additionally, as shown in Fig. 1, the storage
devices or memories 14 and 20 are preferably implemented
using static or dynamic RAM. However, the devices 14 and
20 can also be implemented using a floppy disk and disk
drive, a writable optical disk and disk drive, a hard
drive, flash memory or the like. Also, it should be
appreciated that the devices 14 and 20 can be either
distinct portions of a single memory or physically
distinct memories.

[0020] Further, it should be appreciated that the links
15 and 17 connecting the devices 14 and 20 and the
processor 12 can be a wired or wireless link to a network
(not shown). The network can be a local area network, a
wide area network, an intranet, the Internet or any other
distributed processing and storage network. In this case,
the electronic document 16 is pulled from a physically
remote memory device 14 through link 15 for processing in
the processor 12 according to the method outlined below.
In this case, the electronic document 16 can be stored
locally in a portion of some other memory device of the
system 10 (not shown).

[0021] The method of this invention identifies two kinds
of target documents 22 for each source document 16. The
two types of target documents are: 1) target documents
that are specifically related to annotated passages; and
2) documents that are generally related to the overall
source document. Once a relationship is established
between the source document and the target documents 22,
the target documents may be displayed by clicking on
selectable links in the displayed document 16.

[0022] References to the two types of target documents
22 are shown in Figs. 2 and 3. As shown in Fig. 2, a
target document 22 related to the specific passage 32 in
the source document 16 is identified by a margin
representation 34 placed in the margin of the source
document 16 near the related passage 32. As shown in Fig.
3, a target document 22 that is related to the source
document 16 as a whole is annotated and shown as an
endnote 36 to the source document. The end note 36
includes the type, the title and summary information.

[0023] Fig. 4 is a flowchart outlining a control routine
for one embodiment of the method of this invention.
Beginning in step S100, the control routine continues to
step S105. In step S105, the control routine determines
if the user has made any annotations. If not, control
loops back to step S105. If so, control continues to step
S110. In step S110, the control routine determines the
annotation of the source document made by the user. Next,
in step S120, the control routine analyzes the text of
the source document and the annotation to determine the
passage being annotated. A passage may include a
paragraph marked with a margin bar, an underlined
sentence or phrase, or the context of one or more circled
terms. Then, in step S130, the control routine generates
a query from the passage. The query includes
content-bearing terms from the identified passage that
are weighted to give importance to any circled words.
Next, in step S140, the control routine searches the
target documents using the query to identify documents
that are related to the passage. Then, at step S150, the
search results are clustered. Clustering is preferably
performed in a manner similar to that described in
"Reexamining the Cluster Hypothesis: Scatter/Gather on
Retrieval Results", M.A. Hearst et al., Proceedings of
ACM SIGIR '96, August 1996, Zurich, Switzerland,
incorporated herein by reference in its entirety.

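Steps S130 and S140 can be illustrated with a minimal sketch. It assumes a plain term-count query, a small hypothetical stop list, a hypothetical boost factor for circled words, and cosine similarity as the ranking measure; none of these specifics is stated in the description, which requires only that content-bearing terms be weighted in favor of circled words.

```python
import math
import re
from collections import Counter

# Hypothetical stop list and boost factor; the description names neither.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "that", "for", "on", "with"}
CIRCLED_BOOST = 3.0


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_query(passage, circled_terms=()):
    """Return a term -> weight map for the passage, boosting circled terms (S130)."""
    counts = Counter(tokenize(passage))
    circled = {t.lower() for t in circled_terms}
    return {term: count * (CIRCLED_BOOST if term in circled else 1.0)
            for term, count in counts.items()}


def cosine(query, doc_text):
    """Cosine similarity between a weighted query and a document's term counts."""
    doc = Counter(tokenize(doc_text))
    dot = sum(w * doc[t] for t, w in query.items())
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    norm_d = math.sqrt(sum(c * c for c in doc.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


def search(query, targets):
    """Rank target documents (title -> text) by similarity to the query (S140)."""
    scored = [(cosine(query, text), title) for title, text in targets.items()]
    return sorted(scored, reverse=True)
```

Calling `build_query` on an annotated passage and passing the result to `search` yields the candidate list that step S150 would then cluster. The clustering itself is not sketched here.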
[0024] Next, in step S160, the control routine selects a
typical document from each cluster. These documents are
further filtered by a user-specified similarity threshold
in step S170. Then, in step S180, the remaining documents
are identified by displaying links to those documents in
the margin of the source document adjacent to the passage
from which the query was generated. Each selectable link
may be an icon representing a type of the selected and
filtered target document and a short title.

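Steps S160 and S170 might be sketched as follows. The sketch assumes that the clusters are already available as lists of (title, term-weight vector) pairs, that the "typical" document is the member closest to the cluster centroid, and that the threshold is applied to cosine similarity with the query. These are illustrative assumptions; the description does not define how a typical document is chosen.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term -> weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average a non-empty list of sparse vectors into one sparse vector."""
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def typical_document(cluster):
    """Pick the (title, vector) pair closest to the cluster centroid (S160)."""
    center = centroid([vec for _, vec in cluster])
    return max(cluster, key=lambda doc: cosine(doc[1], center))


def margin_links(clusters, query, threshold):
    """Keep one representative per cluster if it is similar enough (S170)."""
    picks = [typical_document(cluster) for cluster in clusters if cluster]
    return [title for title, vec in picks if cosine(vec, query) >= threshold]
```

The titles returned by `margin_links` correspond to the links that step S180 would place in the margin next to the annotated passage.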
[0025] Next, in step S190, the control routine determines
if a user has selected a selectable link in the current
source document. If, in step S190, a user has selected a
selectable link, the control routine proceeds to step
S200. In step S200, the target document is displayed as
the new current source document. Control then continues
back to step S105, where it waits for another annotation
to be made. Alternatively, if, in step S190, no
selectable link is selected, then the control jumps
directly back to step S105. The control routine continues
until the user has closed all open source documents 16
displayed on the display 18.

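The control flow of steps S105 through S200 can be approximated by a small event loop. This is a sketch under stated assumptions rather than a transcription of Fig. 4: reader actions are modeled as a pre-recorded list of hypothetical events, and the `suggest` callable is a stand-in for steps S110 through S170.

```python
def run_reader(events, suggest, first_document):
    """Process reader events in order and return a log of what was displayed.

    events: iterable of ("annotate", passage), ("select", title) or ("close",).
    suggest: callable (document, passage) -> list of link titles; a
             hypothetical stand-in for steps S110 through S170.
    """
    current = first_document
    links = []   # links currently shown in the margin of the current document
    log = []
    for event in events:
        kind = event[0]
        if kind == "annotate":                             # S105: annotation made
            links = suggest(current, event[1])             # S110 to S170
            log.append(("links", current, tuple(links)))   # S180: show links
        elif kind == "select" and event[1] in links:       # S190: link selected
            current = event[1]                             # S200: new source document
            links = []
            log.append(("open", current))
        elif kind == "close":                              # reader closed the document
            break
    return log
```

Because the loop never blocks on a selection, the reader is free to ignore a suggestion and keep annotating, which mirrors the jump from step S190 back to step S105.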
[0026] To compute end notes, the flowchart of Fig. 4 can
be used with slight modifications. The control routine
proceeds identically as directed for the creation of
margin notes from step S100 through step S120. However,
at step S130 a weighted-sum query is generated. In step
S130, terms that are explicitly identified by the reader
and terms identified by standard relevance feedback
techniques are used to construct the weighted-sum query.
The identified terms are assigned weights based upon the
annotations made to the document. For instance, words
that have been expressly selected by the user are
weighted the highest and words that occur in selected
paragraphs are weighted higher than the remaining terms
of the source document.

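The three-tier weighting described for end notes could look like the sketch below. The numeric weights are hypothetical, since the description fixes only their ordering, and the selection of additional terms by relevance feedback is not modeled.

```python
import re
from collections import Counter

# Hypothetical tier weights; the description fixes only their ordering.
W_SELECTED, W_PARAGRAPH, W_OTHER = 3.0, 2.0, 1.0


def terms(text):
    """Return the lowercase alphabetic tokens of the text."""
    return re.findall(r"[a-z]+", text.lower())


def end_note_query(paragraphs, annotated_indexes, selected_words):
    """Build a term -> weight map over the whole source document.

    paragraphs: list of paragraph strings making up the source document.
    annotated_indexes: indexes of paragraphs the reader marked.
    selected_words: words the reader expressly selected.
    """
    selected = {word.lower() for word in selected_words}
    annotated = set(annotated_indexes)
    query = Counter()
    for index, paragraph in enumerate(paragraphs):
        for term in terms(paragraph):
            if term in selected:
                query[term] += W_SELECTED      # expressly selected word
            elif index in annotated:
                query[term] += W_PARAGRAPH     # word in a marked paragraph
            else:
                query[term] += W_OTHER         # any other word in the document
    return dict(query)
```

Each occurrence of a term contributes the weight of its tier, so a term that appears both inside and outside a marked paragraph accumulates both contributions.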
[0027] Documents that have been identified as related to
the document using the weighted-sum query generated in
step S130 are processed in a manner similar to the
remaining steps S140 through S200 with the exception that
the link is displayed as an end note in step S180 rather
than as a margin note.

[0028] It should be understood that either or both of 
these control routines may be running in the background 
of a document reading system of the invention. 
[0029] Optionally, the system and method of this
invention may derive summaries from documents through an
automatic text summarization process in a manner similar
to that described in "A Trainable Document Summarizer",
J. Kupiec et al., Proceedings of SIGIR '95, July 1995,
Pittsburgh, PA, ACM Press, incorporated herein by
reference in its entirety. The summaries are then
displayed as end notes.

[0030] It is to be understood that the term annotation 
as used herein is intended to include text, digital ink, 
audio, video or any other input associated with a docu- 
ment. It is also to be understood that the term document 
is intended to include text, video, audio and any other 
media and any combination of media. Further, it is to be 
understood that the term text is intended to include text,
digital ink, audio, video or any other content of a
document, including the document's structure.



Claims 

1. A method for displaying in a display of a first
document, at least one link to another document, each
other document being related to the first document,
the method comprising:

identifying at least one user annotated segment
of the first document;

identifying at least one second document that
is related to the at least one annotated segment
of the first document; and

displaying in the first document a selectable link
for each second document.

2. The method of claim 1, wherein the selectable link
is displayed as an end note to the first document.

3. The method of claim 1 or claim 2, the step of
identifying the at least one second document comprising
identifying at least one portion of the at least one
second document as related to the at least one
annotated segment, the selectable link referencing the
identified at least one portion, the selectable link
being displayed in the margin adjacent to the at least
one annotated segment and the step of identifying
being in response to the annotation of the at least
one segment of the first document.

4. The method of any one of claims 1 to 3, wherein the
step of identifying the at least one second document
comprises determining the relatedness based upon
user identified terms and terms identified using
relevance feedback techniques.

5. The method of any one of claims 1 to 4, further
comprising the steps of:

determining if the selectable link has been
selected; and

displaying the identified at least one second
document in response to the selection of the
selectable link.

6. An electronic document system for suggesting in a
display of a first document at least one second
document that is related to the first document, the
system comprising:

a processor that identifies at least one user
annotated segment of the first document and that
identifies at least one second document as
related to the annotated segment of the first
document; and

a display that displays a selectable link that
references the identified at least one second
document in a display of the first document.

7. The system of claim 6, wherein the selectable link
is displayed as an end note to the first document.

8. The system of any one of claims 6 and 7, wherein
the processor identifies the at least one second
document based upon user identified terms and terms
identified based upon relevance feedback
techniques.

9. The system of any one of claims 6 to 8, further
comprising a user input interface, wherein the processor
is responsive to the annotation of a segment of the
first document by the user to identify the at least one
second document.

10. The system of any one of claims 6 to 9, further
comprising a user interface, wherein the display is
responsive to the selection of the selectable link by
the user to display the identified at least one second
document.